***Replication Code for Graduate Student Satisfaction Survey with Anonymized Data***
*03/09/23*

***IMPORTANT***
***This version addresses a data cleaning error and a coding error that affected the experimental analysis in the original article and replication data.

*The data used in the article's analysis contains information that could identify individual survey takers. All of this information has
*been dropped from the publicly available anonymized dataset. This code assumes that you are using the publicly available anonymized
*dataset. Commands that require the full dataset are listed as comments here so that you can see what we did. 

***Why we drop specific variables***
*We dropped all open-ended answers because some responses contained identifying information.
*We dropped race because some categories had few people and individuals could be identified by race and, for example, harassment experience.
*We dropped gender because the trans category had few people and they could be identified with other pieces of data.
*We dropped university/program and rank because some had only a few respondents that could be identified with other pieces of data.
*We dropped user data, including country of origin, for privacy.
*We dropped age because some ages had only a few respondents that could be identified with other pieces of data.

*Full Data (after running data cleaning code)
*use "/Users/calla/Dropbox/Work/Papers/Working/Grad Student Survey/Data/GSSSReplicationData030923.dta"

*Make Anonymized Dataset*
*Drop all open ended questions
*Drop any variable where a small enough number of people answered that they could be identified with other variables
*age, race, ethnicity, university, gender, rank
*drop User Progress Durationinseconds PleasenoteyourageAge University Ifyouruniversitywasnotonth Whatisyourgenderidentityc
*drop J RacecheckallthatapplySe RacecheckallthatapplyOt Ethnicity Whatcountryareyoufrom Fieldofstudycheckallthata U Y 
*drop Ifyoufeelcomfortableplease Inwhatwayswasitunfair BI BO BR BV program gender POC race gender1 race1 subfield rank
*drop Whatdoesresearchfundinginyo Hasyourdepartmentprovidedext Isthereanysortofharassment Whatchoiceswouldyouhavemade

*save "/Users/calla/Dropbox/Work/Papers/Working/Grad Student Survey/Data/GSSSAnonymizedData121522.dta"
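
*Optional check (a minimal sketch, not part of the original analysis): confirm that a few of the
*identifying variables named in the drop commands above are absent from the dataset in memory.
*The list below is illustrative, not exhaustive; extend it with any other dropped variable.
foreach v in race gender rank program subfield POC {
    capture confirm variable `v', exact
    if _rc == 0 {
        display as error "Identifying variable `v' is still present"
    }
}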


***Descriptive data***

*Methods section

*271 responses/205 complete
tab Finished

*49 universities, missing Rochester.
*tab program

*Table 1 Demographics: 
*Race
*tab race

*Gender
*tab gender

*Age
*tab PleasenoteyourageAge

*First gen
tab Areyouafirstgenerationstude

*LGBTQ+ 31%
tab DoyouidentifyasLGTBQ

*Unionized 45%
tab Doesyouruniversityhaveagrad

*Results section

*Funding
*42% always, 31% usually, 12% rarely/never
tab Doesyourdepartmentâsfinanci

*90% have 5+ years of guaranteed funding
tab Howmanyyearsoffundingdidyo

*58% have summer funding
tab Doesyourcontractguaranteesum

*Outside work: 31%
tab Haveyouhadtoworkoutsideyou

*Fair and transparent: 29%
tab Doyoufeelfundingdecisionsin

*Harassment
*Economic exploitation: 45/20% yes
tab Haveyouexperiencedexploitativ

*Sexual harassment: 19/9% yes
tab sexharass

*Racism: 40/19% yes
tab racism

*Homophobia: 13/6% yes
tab homophobia

*13% of LGBTQ people experienced homophobia
tab homophobia if DoyouidentifyasLGTBQ=="Yes"

*Table 2 descriptive statistics
*Labor:
*tab Haveyouexperiencedexploitativ if race == "White" & gender == "man" 
*tab Haveyouexperiencedexploitativ if race == "White" & gender == "woman" 
*tab Haveyouexperiencedexploitativ if gender == "trans" 
*tab Haveyouexperiencedexploitativ if race == "Asian" & gender == "woman" 
*tab Haveyouexperiencedexploitativ if race == "Latinx" & gender == "man"
*tab Haveyouexperiencedexploitativ if race == "Latinx" & gender == "woman" 
*tab Haveyouexperiencedexploitativ if POC == 1 & gender == "man" 
*tab Haveyouexperiencedexploitativ if POC == 1 & gender == "woman" 

*Sexual harassment
*tab sexharass if race == "White" & gender == "man" 
*tab sexharass if race == "White" & gender == "woman" 
*tab sexharass if gender == "trans" 
*tab sexharass if race == "Asian" & gender == "woman" 
*tab sexharass if race == "Latinx" & gender == "man" 
*tab sexharass if race == "Latinx" & gender == "woman" 
*tab sexharass if POC == 1 & gender == "man" 
*tab sexharass if POC == 1 & gender == "woman" 

*Racism
*tab racism if race == "White" & gender == "man" 
*tab racism if race == "White" & gender == "woman" 
*tab racism if gender == "trans" & top50==1
*tab racism if race == "Asian" & gender == "woman" 
*tab racism if race == "Latinx" & gender == "man" 
*tab racism if race == "Latinx" & gender == "woman" 
*tab racism if POC == 1 & gender == "man" 
*tab racism if POC == 1 & gender == "woman" 

*Homophobia
*tab homophobia if race == "White" & gender == "man" 
*tab homophobia if race == "White" & gender == "woman" 
*tab homophobia if gender == "trans" 
*tab homophobia if race == "Asian" & gender == "woman" 
*tab homophobia if race == "Latinx" & gender == "man" 
*tab homophobia if race == "Latinx" & gender == "woman" 
*tab homophobia if POC == 1 & gender == "man" 
*tab homophobia if POC == 1 & gender == "woman" 

*Any harassment
*gen harassment = 0
*replace harassment = 1 if sexharass == 2
*replace harassment = 1 if racism == 2
*replace harassment = 1 if homophobia == 2
*replace harassment = 1 if exploitation == 2
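*The commands above are commented out. A minimal sketch, assuming that is because harassment is
*already present in the dataset: rebuild it only when it is missing, using the same coding
*(2 = yes) as the commands above.
capture confirm variable harassment, exact
if _rc != 0 {
    gen harassment = 0
    replace harassment = 1 if sexharass == 2 | racism == 2 | homophobia == 2 | exploitation == 2
}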
*29% report at least one form
tab harassment

*How likely are you to report?
tab Howlikelyareyoutoreportaf

*Experiment
oneway ReportMisconduct experiment, tabulate
pwmean ReportMisconduct, over(experiment) mcompare(tukey) effects
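*The one-way ANOVA above can also be written as a regression on condition indicators.
*A minimal sketch (not part of the original analysis), assuming experiment is coded as nonnegative
*integers, which factor-variable notation requires. The joint test reproduces the oneway F statistic.
regress ReportMisconduct i.experiment
testparm i.experiment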

*Would the department do something if you filed a complaint?
tab Ifyoufileaharassmentcomplai
*Gendered and racialized responses
*tab Ifyoufileaharassmentcomplai gender 
*tab Ifyoufileaharassmentcomplai race 
*54% say yes, 17% no and 29% maybe
*tab Ifyoufileaharassmentcomplai if race == "White" & gender == "man"

*Heard of homophobia: 21%
tab BJ
*Experienced homophobia:
tab homophobia
*Experienced homophobia and LGBTQ:
tab homophobia if DoyouidentifyasLGTBQ=="Yes"

*Racism
*Heard of racism: 59%
tab BP 

*Experienced racism: 40/19% yes
tab racism

*45% of POC students
*tab racism if POC == 1

*Will your department do something if you file a complaint? Yes responses are predominantly from white respondents.
tab Ifyoufilearacialharassment 
*tab Ifyoufilearacialharassment race 
*tab Ifyoufileaharassmentcomplai if race == "White" & gender == "man"

*Advisor
summarize SatisfactionOnascaleofnot
*by gender and race
*tab SatisfactionOnascaleofnot gender
*tab SatisfactionOnascaleofnot race

*Satisfaction*
sum SatisfactionOnascaleofbur 

*Correlation matrix*
corr SatisfactionOnascaleofbur sexharass racism homophobia exploitation 
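*corr uses listwise deletion, so only respondents who answered all five items contribute.
*A sketch (not part of the original analysis) of the pairwise alternative, which reports the
*number of observations and the p-value for each pair:
pwcorr SatisfactionOnascaleofbur sexharass racism homophobia exploitation, obs sig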

**Regressions**

*Descriptives (in Appendix 3)
summarize SatisfactionOnascaleofbur
summarize SatisfactionOnascaleofnot
summarize money
summarize research   

summarize exploitation 
summarize sexharass 
summarize homophobia 
summarize racism

*Model 1: basic satisfaction
*regress SatisfactionOnascaleofbur SatisfactionOnascaleofnot money1 research1 i.gender1 ib6.race1 i.status

*Model 2: satisfaction with discrimination variables
*regress SatisfactionOnascaleofbur SatisfactionOnascaleofnot money1 research1 i.gender1 ib6.race1 i.status i.exploitation i.sexharass i.racism

*Note: money1 and research1 are ordinal variables with the categories in ascending order. money and research are ordinal with the categories jumbled.
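*A quick way to see how the jumbled categories map onto the ascending recodes
*(a sketch, not part of the original analysis):
tab money money1, missing
tab research research1, missing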


***Appendices***

*Appendix 2: Descriptives*
*271 responses/205 complete
tab Finished

*49 universities, missing Rochester.
*tab program

*Table 1 Demographics: 
*Race
*tab race

*Gender
*tab gender

*Age
*tab PleasenoteyourageAge

*First gen
tab Areyouafirstgenerationstude

*LGBTQ+ 31%
tab DoyouidentifyasLGTBQ

*Unionized 45%
tab Doesyouruniversityhaveagrad

*Kids: 8%
tab kids 

*International student: 29%
tab international 

*ABD: 53%
tab Statusingraduateprogram

*Appendix 3: Analyses

*Descriptives on variables in the models Table A3.2
summarize SatisfactionOnascaleofbur
summarize SatisfactionOnascaleofnot
summarize money1
summarize research1   

*Descriptives on variables in the models Table A3.3
summarize exploitation 
summarize sexharass 
summarize homophobia 
summarize racism

*Main models reproduced 
*Model 1: basic satisfaction
*regress SatisfactionOnascaleofbur SatisfactionOnascaleofnot money1 research1 i.gender1 ib6.race1 i.status

*Model 2: satisfaction with discrimination variables
*regress SatisfactionOnascaleofbur SatisfactionOnascaleofnot money1 research1 i.gender1 ib6.race1 i.status i.exploitation i.sexharass i.racism

*Ordered logistic check:
*Model 3
*ologit SatisfactionOnascaleofbur SatisfactionOnascaleofnot i.money i.research i.gender1 ib6.race1 i.status
*Model 4
*ologit SatisfactionOnascaleofbur SatisfactionOnascaleofnot i.money i.research i.gender1 ib6.race1 i.status i.exploitation i.sexharass i.racism


*Full controls
*Model 5 base model
*regress SatisfactionOnascaleofbur SatisfactionOnascaleofnot money1 research1 i.gender1 ib6.race1 i.status
*Model 6 harassment model
*regress SatisfactionOnascaleofbur SatisfactionOnascaleofnot money1 research1 i.gender1 ib6.race1 i.status i.exploitation i.sexharass i.racism
*Model 7 add department/university level variables 
*regress SatisfactionOnascaleofbur SatisfactionOnascaleofnot money1 research1 i.gender1 ib6.race1 i.status i.exploitation i.sexharass i.racism rank i.union i.public
*Model 8 add additional demographic variables
*regress SatisfactionOnascaleofbur SatisfactionOnascaleofnot money1 research1 i.gender1 ib6.race1 i.status i.exploitation i.sexharass i.racism i.kids i.international i.firstgen i.lgbt PleasenoteyourageAge

*Harassment models

*Sexual Harassment
*Model 9: gender, exploitation, and international student status are significant
*ologit sexharass i.gender1 rank i.exploitation i.racism i.international 

*Racism
*Model 10: race and exploitation are significant
*ologit racism ib6.race1 i.gender1 i.exploitation i.sexharass i.international

*Exploitation
*Model 11: money, status, and other forms of harassment are significant
*ologit exploitation i.money rank i.status i.gender1 ib6.race1 i.sexharass i.racism

*Funding
*ologit money1 i.public i.status i.gender1 ib6.race1  ib5.source

*Experiment
oneway ReportMisconduct experiment, tabulate
pwmean ReportMisconduct, over(experiment) mcompare(tukey) effects
